
 loss curve







Supplementary Material: Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation 1 Proofs

Neural Information Processing Systems

The constraint P_X, P_Y ∈ Prob(X) states that P_X and P_Y are valid probability distributions. For brevity, we shall ignore explicitly stating it in the rest of the proof. The above equation is similar in spirit to the Kantorovich-Rubinstein duality. An important observation to note is that the above optimization only maximizes over a single discriminator function (as opposed to two functions in optimization (2)). Hence, it is easier to train it in large-scale deep learning problems such as GANs.




Predicting Training Time Without Training: Supplementary Material

Neural Information Processing Systems

In both cases we observe that the predicted curve is reasonably close to the actual curve, more so at the beginning of the training (which is expected, since the linear approximation is more likely to hold). Point-wise similarity of predicted and observed loss curve. Up to now we focused on prediction error rates (see e.g. We started defining training time as the first time the (smoothed) loss is below a given threshold (which we then normalized w.r.t. In Section 4 we suggest that, in the case of MSE loss, it is possible to predict the training time on a large dataset using a subset of the samples. However, since our training time definition measures the time to reach the asymptotic value (which is what is most useful in practice) rather than the time to reach an absolute threshold, this does not affect the accuracy of the prediction (see Appendix C).
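The threshold-based definition of training time mentioned above (the first time the smoothed loss falls below a given threshold) can be illustrated with a minimal sketch. The excerpt does not say how the loss is smoothed or how the threshold is normalized, so the trailing moving average, the `window` parameter, and the function names `smooth_loss` and `training_time` are assumptions made for illustration only, not the authors' procedure.

```python
import numpy as np


def smooth_loss(losses, window=5):
    """Trailing moving average of a loss curve.

    Assumption: the excerpt only says the loss is "smoothed"; a moving
    average with a hypothetical `window` is used here as one simple choice.
    """
    losses = np.asarray(losses, dtype=float)
    if window < 1 or losses.size < window:
        raise ValueError("window must be >= 1 and <= number of loss values")
    kernel = np.ones(window) / window
    # mode="valid" yields one average per full window:
    # entry i is the mean of losses[i : i + window].
    return np.convolve(losses, kernel, mode="valid")


def training_time(losses, threshold, window=5):
    """First step at which the smoothed loss is below `threshold`.

    Returns the index (in the original loss sequence) of the last sample
    in the first window whose average is below the threshold, or None if
    the threshold is never reached.
    """
    smoothed = smooth_loss(losses, window)
    below = np.flatnonzero(smoothed < threshold)
    if below.size == 0:
        return None
    # Shift from window index to the step that completes that window.
    return int(below[0]) + window - 1
```

For example, `training_time(observed_losses, threshold=0.35, window=2)` returns the first step at which the two-step average of a (hypothetical) `observed_losses` sequence drops below 0.35. Applying the same function to a predicted and an observed curve gives two training times that can be compared directly.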